ABSTRACT


For  the  infinite  user  population  multiple  access  communication  channel 
in  which  the  number  of  colliding  packets  is  known  exactly,  it  has  been 
shown  previously  that  it  is  possible  to  achieve  a  throughput 
arbitrarily  close  to  one.  This  result  is  examined. 

A  particular  two-step  problem  formulation  is  described.  In  this 
strategy,  the  time  axis  is  divided  into  many  small  non-overlapping 
segments.  By  enabling  subsets  of  this  set  of  segments,  collisions  are 
generated  to  learn  the  number  of  data  packets  in  each  segment.  In  the 
second  stage,  each  segment  found  to  contain  one  or  more  data  packets  is 
resolved  by  splitting. 


It  is  shown  that  the  problem  of  identifying  the  number  of  packets  in 
each  segment  given  the  collision  results  is  NP  complete.  A  tradeoff 
between  complexity  and  throughput  is  described.  The  number  of  packets 
in  the  backlog  is  lower  bounded  in  terms  of  attainable  throughput.  The 
bound  indicates  that  achieving  high  throughput  requires  an  enormous 
amount  of  computation  and  very  large  delay,  indicating  that  all 
strategies  of  this  type  are  essentially  useless  for  obtaining  high 
throughput.


Thesis  Supervisor:  Robert  G.  Gallager 

Title:  Professor  of  Electrical  Engineering 

A multiaccess communication channel consists of a group of
transmitting stations sending and receiving data packets. The
fundamental problem associated with these channels is to find
algorithms for the efficient transmission of messages. Of course, to
solve this problem it is necessary to specify a particular model of the
communication channel. Typically the following assumptions are made:

1. Arrivals of data packets form a Poisson process with an
overall arrival rate λ to the set of transmitters.

2. There are an infinite number of stations. So, by our first
assumption, this restriction implies that over a finite
lifetime, a particular station will send no more than one
packet. This precludes the use of time division
multiplexing and focuses attention on algorithms which are
efficient when the arrival rate to any one station is
small. Moreover, the performance achieved in the infinite
station case is a lower bound to the performance in the
finite case. With a finite number of stations, when a
station has more than one packet waiting for transmission
it can pretend to be two separate stations, one for each
packet. The effective system is identical to that of an
infinite number of stations.

3. The system operates in units of time slots. Each data
packet has a length of one slot and every transmission
falls exactly within a single slot. Realistically, a data
packet must be slightly shorter than a slot. We will ignore
this fact since it has no effect on the types of algorithms
used. Also, precise synchronization of station clocks is
necessary. This constraint is not particularly difficult to
satisfy and will not be considered further.

4.  Feedback  is  instantly  received  at  the  end  of  each  slot.  In 
particular,  ternary  feedback  of  the  following  sort  is 
assumed. 

i)  If  no  packets  are  sent  in  a  slot,  then  all  stations 
identify  that  the  channel  was  idle. 

ii)  If  exactly  one  station  sends  a  packet  in  a  slot,  then 
all  stations  correctly  receive  that  packet  and 
identify  that  there  was  a  successful  transmission. 

iii)  If  two  or  more  stations  transmit  packets  in  a  slot, 
then  a  collision  occurs.  No  data  is  received,  though 
all  stations  learn  that  a  collision  has  occurred. 

Thus, in our model, stations attempt to send their packets based on
the results of the previous slots. The best of these collision
resolution strategies is known as splitting, which we describe by the
algorithm below. Note that in this algorithm, enable(t,δ) is a function
which enables the interval [t,t+δ) of the arrival axis and returns the
corresponding feedback. That is to say, all packets that were created
in the time interval [t,t+δ) are transmitted and we learn whether there
were zero, one, or more than one packet in the interval. Also, we
define t such that we know that all packets that arrived before time t
have been successfully transmitted. The backlog of the system is simply
T-t, where [t,T) is the unexamined arrival interval. The algorithm
begins by enabling an unexamined arrival interval [t,t+δ_0). The
parameter δ_0 will be chosen to maximize throughput. Lastly, this
algorithm assumes there is sufficient backlog such that an optimal
interval of length δ_0 can be enabled. Clearly, when the backlog is
shorter than δ_0, an interval of length less than δ_0 would be enabled. Of
course, when the backlog is that small, enabling an optimum size interval
is not very important.


The Splitting Algorithm

Begin
    δ := δ_0;
    split := false;
    repeat
        case enable(t,δ) of
            0 : begin                        {idle slot}
                    t := t + δ;
                    if split then δ := δ/2;
                end;
            1 : begin                        {successful transmission}
                    t := t + δ;
                    if not split then δ := δ_0
                    else split := false;
                end;
            c : begin                        {collision}
                    δ := δ/2;
                    split := true;
                end;
        end; {case}
    until termination;
End.


The essential idea behind the algorithm is that whenever there is a
collision, we split the enabled interval into two half-intervals each of
which can be resolved separately. The above algorithm includes several
improvements to this basic idea. If an interval is split into two
half-intervals and the first of these half-intervals is found to be
empty, then we know that the second half-interval must contain at least
two packets. Consequently, we immediately split the second
half-interval, avoiding the certain collision. The second improvement
results from the following observation. Suppose an enabled interval
contains a collision. In addition, when we split the interval and enable
the first half, we find another collision. Given this circumstance, the
conditional distribution of packets in the second half-interval is
identical to that of an unexamined interval. Resolving this effectively
unexamined half-interval is not optimal since this is equivalent to
beginning the algorithm with an unexamined interval of length less than
the optimal δ_0. Instead, we treat the half-interval as part of the
unexamined arrival axis and enable a full interval of size δ_0.

We define a collision resolution interval as the time from when we
enable an interval of length δ_0 until the next time we enable an
interval of length δ_0. We define the throughput as the ratio of the
expected number of successful transmissions to the expected number of
slots used per collision resolution interval. An analysis of this
method can be found in [2]. This algorithm remains stable for an
arrival rate λ < .4871 packets/slot; that is, for λ < .4871, the average
service rate in a collision resolution interval is greater than the
arrival rate. In other words, our maximum possible throughput is .4871.
This is achieved by choosing δ_0 such that λδ_0 = 1.26 packets.
Interestingly, it has been shown [1] that always splitting intervals
precisely in half is not optimal. When our splits are made optimally,
the maximum throughput becomes .4878 packets/slot. Lastly, it has
been shown that stable throughputs of greater than .58 packets/slot are
not possible [4].
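
The behaviour of the splitting algorithm can be checked by simulation. The following Python sketch is an illustration only, not part of the original analysis. It assumes a saturated backlog, takes δ_0 = 1 so that the arrival rate equals the expected number of packets per initial interval, and draws a fresh Poisson interval for every collision resolution interval, which relies on the observation above that returned half-intervals are statistically unexamined. The names run_cri and estimate_throughput are hypothetical.

```python
import random


def run_cri(arrivals, delta0):
    """Run one collision resolution interval on [0, delta0).

    arrivals: creation times of the packets in [0, delta0).
    Returns (slots_used, packets_sent).
    """
    t, delta, split = 0.0, delta0, False
    slots = sent = 0
    while True:
        # enable(t, delta): count the packets created in [t, t + delta)
        n = sum(1 for a in arrivals if t <= a < t + delta)
        slots += 1
        if n == 0:                       # idle slot
            t += delta
            if not split:
                return slots, sent       # the initial interval was empty
            delta /= 2                   # other half holds >= 2 packets: split it now
        elif n == 1:                     # successful transmission
            t += delta
            sent += 1
            if not split:
                return slots, sent       # next enabled interval has length delta0
            split = False                # enable the second half next
        else:                            # collision
            delta /= 2
            split = True


def estimate_throughput(mean_packets, num_cris, seed=0):
    """Estimate packets sent per slot with delta0 = 1 and a saturated backlog."""
    rng = random.Random(seed)
    total_slots = total_sent = 0
    for _ in range(num_cris):
        # Poisson arrivals on [0, 1) built from exponential gaps
        arrivals = []
        a = rng.expovariate(mean_packets)
        while a < 1.0:
            arrivals.append(a)
            a += rng.expovariate(mean_packets)
        slots, sent = run_cri(arrivals, 1.0)
        total_slots += slots
        total_sent += sent
    return total_sent / total_slots


if __name__ == "__main__":
    print(estimate_throughput(1.26, 100000, seed=1))
```

With an expected 1.26 packets per initial interval, the estimate should land near the quoted .4871 packets/slot, up to sampling error.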

Surprisingly though, if we modify our feedback assumption to be the
following, very different results are possible.

4'. If two or more stations transmit packets in a slot, then a
collision occurs. No data is received, though all stations
immediately learn exactly how many packets were involved in
the collision.

In systems of this type, each station measures the signal energy in
every slot. This measurement indicates the number of active senders. We
will not consider the feasibility of this assumption, although it is an
issue. Obviously, the possibility of inaccurate feedback can only
degrade performance.

For  systems  of  this  type,  Pippenger  [5]  has  demonstrated  that 
throughput  arbitrarily  close  to  one  is  possible.  The  improvement  in 
achievable  throughput,  particularly  in  comparison  to  ternary  feedback, 
encourages  careful  consideration  of  this  result.  Thus,  how  to  identify 
the  distribution  of  backlogged  packets  by  the  results  of  collisions 
will  be  formulated  as  a  standard  problem.  We  will  study  the 
effectiveness  of  an  obvious  solution  strategy.  In  addition  we  will 
examine  the  difficulty  of  our  problem  and  will  show  a  tradeoff  between 
complexity  and  attainable  throughput. 


Problem Formulation, Implementation and Difficulty


Before explaining Pippenger's method, we can indicate why such a
result is possible. Suppose we have a very large backlog of packets
which have yet to contend for transmission. Based on the time of
creation of each contending packet, the backlog can be viewed as a
queue over an interval of the time axis. We divide this time interval
into L disjoint segments of equal size and define x_i as the number of
packets in segment i. Our problem then is to identify the vector
x = [x_1 ... x_L].

Since the arrival process is Poisson, this vector can be viewed as a
sequence of L letters from a discrete memoryless source. We wish to
encode this sequence for transmission to a destination using a code
alphabet consisting of the results of collisions. By enabling a subset
of the components of x, that is by instructing all stations with
packets in this subset to transmit those packets, we can generate a
collision. The destination, which is just the set of stations, learns
the number of packets involved in the collision and which segments were
allocated. This can be viewed as receiving a code letter. Our intention
is to find a mapping of source letters into code letters so as to be
able to identify the sequence of source letters by the reception of
code letters. This is just a source coding problem. The
components {x_i} are independent Poisson random variables each with
entropy

$$ H(U) = -\sum_{k=0}^{\infty} p_k \ln p_k $$

where

$$ p_k = \frac{\rho^k e^{-\rho}}{k!} $$

and ρ is the expected number of packets in each segment. It is readily
shown that

$$ H(U) \le \rho^2/2 + \rho - \rho \ln \rho $$

which is constant for fixed ρ. The source coding theorem then tells us
that it is possible to generate a code such that n̄, the average length
of the code words per source letter, obeys

$$ \frac{H(U)}{\ln N} \le \bar{n} < \frac{H(U)}{\ln N} + \frac{1}{L} $$

where N is the size of the code alphabet. So, N is simply the number of
packets in the backlog. Note that as L approaches infinity, N approaches
infinity implying that n̄ goes to zero, suggesting that there exist
codes for which arbitrarily high throughputs are possible.
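
As a numerical check of the entropy bound above, the following Python sketch sums the Poisson entropy directly and compares it with ρ²/2 + ρ − ρ ln ρ. It is an illustration under stated assumptions: entropies are in nats, the infinite sum is truncated once the remaining probability mass is negligible, and the function names are hypothetical.

```python
import math


def poisson_entropy(rho, tol=1e-12, k_max=1000):
    """Entropy in nats of a Poisson random variable with mean rho."""
    h = 0.0
    p = math.exp(-rho)          # p_0
    cumulative = 0.0
    k = 0
    while cumulative < 1.0 - tol and k < k_max:
        if p > 0.0:
            h -= p * math.log(p)
        cumulative += p
        k += 1
        p *= rho / k            # p_k = p_{k-1} * rho / k
    return h


def entropy_bound(rho):
    """Upper bound rho^2/2 + rho - rho ln(rho) on the Poisson entropy."""
    return rho * rho / 2 + rho - rho * math.log(rho)


def code_length_lower_bound(rho, alphabet_size):
    """Lower bound H(U) / ln N on code letters per source letter."""
    return poisson_entropy(rho) / math.log(alphabet_size)


if __name__ == "__main__":
    for rho in (0.05, 0.5, 1.0, 2.0):
        print(rho, poisson_entropy(rho), entropy_bound(rho))
    for n in (10, 1000, 100000):
        print(n, code_length_lower_bound(1.0, n))
```

The last loop shows the lower bound on n̄ shrinking as the alphabet size N grows, which is the effect the argument relies on.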

However,  this  observation  does  not  show  that  there  exists  such  a 
code  that  can  be  implemented  by  data  packet  collisions  created  by  the 
group  of  stations  acting  in  concert.  The  difficulty  is  finding  a 
suitable  assignment  of  source  sequences  to  code  words  because  our  set 
of  code  words  is  restricted  to  be  the  results  of  packet  collisions.  Our 
code  would  be  described  as  an  algorithm  executed  locally  at  each 
station.  This  algorithm  would  indicate  which  segments  will  be  enabled 
in  a  slot  given  the  results  of  the  previous  slots. 

However, Pippenger shows that we can expect a properly chosen
random code to be suitable. To identify [x_1 ... x_L], random collisions
are generated to gain sufficient information. In particular, to
identify the total number of contending packets, the entire interval is
initially allocated; that is, all stations with packets attempt to
transmit. From this information, L, the number of equal size disjoint
segments can be chosen appropriately. Pippenger shows that it is
possible to identify the number of contending packets in each of these
segments, that is the vector [x_1 ... x_L], in K = O(L/log L)
steps by allocating each segment with probability 1/2 in every slot.
After identifying the vector x, a second stage is necessary. In this
stage, those segments which hold more than one packet are resolved by
executing the splitting algorithm on each separate segment. The reason
this scheme achieves high throughput is that in the limit of large L,
the ratio K/L approaches zero and the number of slots in the second
stage necessary to send the N packets approaches N. Although this
method seems quite reasonable, Pippenger does not suggest any method to
determine this distribution of packets in the segments from the results
of the random allocation process.

The problem of identifying this distribution can be set up in the
following way. Once again, the segments of the time axis are
numbered 1 through L. Representing the number of contending packets in
segment i by x_i, the L dimensional vector x completely describes the
distribution of packets in the segments. If we have tested K different
subsets with y_j packets in the j-th subset then the results of the
various collisions are completely described by

    Ax = y

where A is a K x L dimensional 0,1 matrix with K < L such that a_ij = 1
implies that segment j was enabled during collision i. Our problem then
becomes:

    Find x such that
        Ax = y
        x, y ≥ 0
        x, y integer
        A : K x L dimensional 0,1 matrix with K < L.


We will call this the identification problem. The A matrix in this
formulation is the result of an algorithm executed locally by each
station. This algorithm decides which segments to enable in each slot.
Of course, these decisions may be based on the outcomes of the earlier
collisions. This formulation admits several interpretations. First is
the  distributed  source  coding  problem  described  above.  A  second  view  is 
that  this  is  just  an  integer  linear  program  without  an  objective 
function.  Note,  however,  that  there  may  be  some  advantage  in  using  a 
particular  objective  function.  Lastly,  this  formulation  can  be  viewed 
as  an  estimation  problem.  Through  an  observation  matrix  A,  we  observe  a 
vector  y.  From  this,  we  would  like  to  deduce  the  vector  x,  which  has  a 
known  a  priori  probability  distribution.  For  the  remainder  of  the 
discussion,  we  will  use  whichever  problem  description  happens  to  be  most 
convenient  in  any  particular  instance. 
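
A small instance makes the identification problem concrete. The Python sketch below is an illustration only: it assumes the total number of packets is already known from enabling the whole interval, and it identifies x by exhaustive search, which is practical only for very small L. The function names are hypothetical.

```python
from itertools import product


def measure(A, x):
    """Collision results y = A x for a 0,1 matrix A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def identify(A, y, total):
    """Return every non-negative integer x with sum(x) == total and A x == y."""
    num_segments = len(A[0])
    return [x for x in product(range(total + 1), repeat=num_segments)
            if sum(x) == total and measure(A, x) == y]


if __name__ == "__main__":
    x_true = (1, 0, 2)                  # packets in segments 1, 2, 3
    A = [[1, 0, 1],                     # collision 1 enables segments 1 and 3
         [0, 1, 1]]                     # collision 2 enables segments 2 and 3
    y = measure(A, x_true)
    print(y, identify(A, y, sum(x_true)))
    # With only the first collision, several distributions remain possible.
    print(identify(A[:1], y[:1], sum(x_true)))
```

In this instance two collisions determine x uniquely, while a single collision leaves four candidate distributions. The search space grows rapidly with L and the number of packets.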

Since A can be nearly any 0,1 matrix, we will show that the
recognition version of the above problem is NP complete. The
recognition version of the problem is:

    Does there exist x such that
        Ax = y
        x, y ≥ 0
        x, y integer
        A : K x L dimensional 0,1 matrix with K < L.

We will first show that a restricted version of our recognition
problem is NP complete.


Theorem: The following problem is NP complete.

    Does there exist x such that
        Ax = y
        x ∈ {0,1}^L
        y ≥ 0
        y integer
        A : K x L, such that a_ij ∈ {0,1}

Proof: Clearly, this problem is in NP since given any 'yes' instance x
we can readily check that the constraints are satisfied. The NP
complete problem that we will transform to our problem is the
satisfiability problem which can be stated in the following
way:

Given m clauses, C_1, ..., C_m involving the Boolean variables
x_1, ..., x_n, is the formula F(x) in conjunctive normal form,
F(x) = C_1(x) · C_2(x) · ... · C_m(x) satisfiable?

Let c_i equal the number of literals in C_i. Replace every
occurrence of x̄_j, the negation of x_j, with z_j. For each clause
C_i, we require that

$$ \sum_{x \in C_i} x + \sum_{z \in C_i} z + \sum_{j=1}^{c_i - 1} s_{ij} = c_i, \qquad i = 1, \ldots, m $$

We also add the constraints

$$ x_j + z_j = 1, \qquad j = 1, \ldots, n $$

$$ x_j,\ z_j,\ s_{ij} \in \{0,1\} $$

These constraints can be written in the desired form, Ax = y with a
0,1 coefficient matrix A, where the vector of unknowns represents
[x z s]^T. Since all the variables are
restricted to be zero or one, our first set of constraints
implies that at least one literal of each clause must be true.
Our second set of constraints simply requires that either x_j
or z_j, but not both, must be true. A solution [x z s] to the
transformed problem gives an x which satisfies F(x). In
addition, an x satisfying F(x) immediately implies a solution
[x z s] to our transformed problem with z = x̄ and s chosen
trivially to satisfy the constraints. The transformed problem
has n + m equations and fewer than 2n(m + 1) variables. Hence,
we have constructed a polynomial time transformation and the
problem is NP complete.


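Note that requiring x ∈ {0,1}^L is a restriction on our original
recognition problem which allowed x to belong to the non-negative
integers. The restriction can be expressed within the original problem:
for each variable v, add a complementary variable v' and the row
v + v' = 1, which forces both to be zero or one. Consequently, our
original recognition problem is NP complete as well.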
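
The transformation used in the proof can be written out mechanically. The following Python sketch is an illustration, not part of the original proof. It assumes clauses are given as lists of distinct non-zero integers, where k stands for x_k and -k for its negation, and it orders the unknowns as [x z s]. The brute-force check is included only to confirm the construction on tiny formulas. All names are hypothetical.

```python
from itertools import product


def sat_to_system(clauses, n):
    """Build (A, y) so that A v = y with v in {0,1}^L iff the formula is satisfiable.

    Columns 0..n-1 hold x_1..x_n, columns n..2n-1 hold z_1..z_n, and the
    remaining columns hold the slack variables s_ij, c_i - 1 per clause.
    """
    slack = [len(c) - 1 for c in clauses]
    num_vars = 2 * n + sum(slack)
    A, y = [], []
    next_slack = 2 * n
    for clause, s in zip(clauses, slack):
        row = [0] * num_vars
        for lit in clause:
            col = lit - 1 if lit > 0 else n - lit - 1   # x_k or z_k column
            row[col] = 1
        for j in range(s):
            row[next_slack + j] = 1
        next_slack += s
        A.append(row)
        y.append(len(clause))        # literals plus slacks must sum to c_i
    for j in range(n):
        row = [0] * num_vars
        row[j] = row[n + j] = 1      # x_j + z_j = 1
        A.append(row)
        y.append(1)
    return A, y


def has_binary_solution(A, y):
    """Exhaustively test whether some 0,1 vector v satisfies A v = y."""
    for v in product((0, 1), repeat=len(A[0])):
        if all(sum(a * b for a, b in zip(row, v)) == yi
               for row, yi in zip(A, y)):
            return True
    return False


if __name__ == "__main__":
    satisfiable = [[1, 2], [-1, 2], [1, -2]]
    unsatisfiable = satisfiable + [[-1, -2]]
    print(has_binary_solution(*sat_to_system(satisfiable, 2)))
    print(has_binary_solution(*sat_to_system(unsatisfiable, 2)))
```

The exhaustive check takes time exponential in the number of columns, which is consistent with the difficulty the theorem describes.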

Curiously, in the context of our data packet problem, the
recognition problem as stated has a trivial solution. That is, since we
have enabled the segments to create the collisions described by Ax = y,
we know perfectly well that there is an x satisfying Ax = y and so the
answer to the recognition problem would always be 'Yes, there is such
an x.' Perhaps then, our identification problem is significantly easier
than this recognition problem. As we shall demonstrate, this is not the
case.


Theorem: The identification problem is NP complete.

Proof: Suppose the contrary so there exists a polynomial time algorithm
for our identification problem which finds a solution x
satisfying Ax = y but only for those instances which we know
beforehand to be 'yes' instances. In this case, the algorithm
must find such a solution x for every instance of the problem
which is a 'yes' instance. Of course, the algorithm must not
find a solution x for a 'no' instance of the problem.
Consequently, for a 'no' problem instance, either our algorithm
returns with 'There is no such x.' or our algorithm does not
terminate. However, our algorithm is assumed to run in
polynomial time and so there must exist a polynomial upper bound
to the amount of time necessary to identify a solution x. Thus,
if our running time were to exceed this bound, we can conclude
that this is a 'no' instance. This implies that our polynomial
time algorithm solves the NP complete recognition problem, which
is a contradiction.

In short, knowing that an instance of the problem is a 'yes'
instance does not make identifying a particular solution any easier.

Our task to find a distribution of packets x such that Ax = y is as
difficult as the recognition problem. Moreover, our proof has shown us
that looking for zero-one data packet distributions is no easier than
looking for non-negative integer distributions. Also, we can place
additional restrictions on the A matrix and the problem will still be
NP complete. For example, suppose we had transformed the
3-satisfiability problem, the satisfiability problem in which each
clause contains three literals. The resulting transformation would have
no more than five variables in each row constraint. In terms of our
data packet problem, this would correspond to the restriction that no
more than five segments can be enabled in any collision. This
'simplified' problem is still NP complete.

The NP completeness result, though disheartening, is somewhat
misleading. This analysis has overlooked the fact that we are allowed
to  choose  our  A  matrix.  That  is,  we  can  specify  which  segments  are  to 
be  enabled  in  each  information  gathering  collision.  There  may  very  well 
be  A  matrices  with  some  special  structure  for  which  our  identification 
problem  becomes  relatively  easy,  yet  for  which  high  throughputs  are 
still  possible.  This  point  remains  unresolved  and  suggests  several 
other  questions. 

Is  there  a  sensible  way  to  choose  our  observations  dynamically? 
Perhaps, given the results of the previous collisions, we can specify
the  segments  to  be  enabled  in  the  next  slot  in  some  beneficial  manner. 
Typically  though,  what  one  perceives  to  be  sensible  is  to  divide  the 
problem  into  several  smaller  subproblems.  However,  this  corresponds  to 
splitting,  which  eliminates  the  possibility  of  asymptotically  high 
throughput. 

An example of this approach would be to use the collision
information to perform optimal splitting, which has been considered in
[6]. Suppose an interval of the arrival axis, which we will label [0,1)
without loss of generality, is known to contain k packets. This
strategy simply enables an interval [0,α), 0 < α ≤ 1. We then learn the
number of packets in [0,α) as well as the number of packets in [α,1).
Resolving packets in each of these sub-intervals is the same problem as
resolving packets in the interval [0,1). Moreover, for splitting
algorithms, optimal resolution of the sub-interval is also optimal for
the entire interval. Thus, we would continue to split these
sub-intervals until all k packets are transmitted. This strategy is
readily analyzed.

Denote N(a,b) as the number of packets in the time interval [a,b)
and r_k as the minimum expected number of transmissions necessary to
resolve an interval known beforehand to contain k packets. Clearly,
r_0 = 0 since an interval containing zero packets requires no transmissions
to resolve. Similarly, r_1 = 1. Given an interval with two or more
packets, and a split [0,α), we can write down the following recursion:

$$ r_k = \begin{cases} 1 + r_k & \text{if } N(0,\alpha) = 0 \\ 1 + r_{k-1} & \text{if } N(0,\alpha) = 1 \\ 1 + r_j + r_{k-j} & \text{if } N(0,\alpha) = j \ge 2 \end{cases} $$

Clearly, for every value of k, the number of packets in an interval,
there is an optimal split [0,α_k) and r_k is a function of α_k. Denoting

$$ P_n(j,\alpha) = \Pr[\, N(0,\alpha) = j \mid N(0,1) = n \,] = \binom{n}{j} \alpha^j (1-\alpha)^{n-j} $$

we can write

$$ r_n(\alpha_n) = [1 + r_n(\alpha_n)]\,P_n(0,\alpha_n) + [1 + r_{n-1}(\alpha_{n-1})]\,P_n(1,\alpha_n) + \sum_{j=2}^{n} [1 + r_j(\alpha_j) + r_{n-j}(\alpha_{n-j})]\,P_n(j,\alpha_n), \qquad n \ge 2 $$

Solving for r_n(α_n), we find that

$$ r_n(\alpha_n) = \frac{1 - P_n(1,\alpha_n) + \sum_{j=0}^{n-1} r_j(\alpha_j)\,[P_n(j,\alpha_n) + P_n(n-j,\alpha_n)]}{1 - P_n(n,\alpha_n) - P_n(0,\alpha_n)} $$

Given {r_0, ..., r_{n-1}} and {α_0, ..., α_{n-1}}, we can minimize over α_n to
find r_n. In particular, the first few terms are:

    n     r_n       α_n

    2     3.000     0.500
    3     4.787     0.412
    4     6.632     0.343
    5     8.485     0.288
    6    10.342     0.249
    7    12.203     0.220
    8    14.067     0.197


Now, suppose we have an unexamined interval [a,b). First, we would
enable the entire interval to find k, the total number of packets. If
k ≤ 1, then we are all done. Otherwise, we would expect to need an
additional r_k transmissions to send all of the packets. So, our
expected number of transmissions would be

$$ R_k = \begin{cases} 1 & \text{if } k \le 1 \\ 1 + r_k & \text{if } k > 1 \end{cases} $$

We can define our throughput as the ratio E[k]/E[R_k]. Note that E[k] =
λ(b - a) where the time difference (b - a) is measured in slots. All
that remains is to choose (b - a) to maximize our throughput. It turns
out that choosing our arrival interval so that the expected number of
packets equals 1.88 is optimal. The resulting throughput achieved
is .5316 packets/slot.
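
The optimal splitting recursion and the resulting throughput can be evaluated numerically. The following Python sketch is an illustration under stated assumptions: it uses the solved form of the recursion with r_0 = 0 and r_1 = 1, minimizes over the split point on a uniform grid rather than analytically, and truncates the Poisson sum at n_max packets. The function names are hypothetical.

```python
import math


def binom_pmf(n, j, alpha):
    """P_n(j, alpha): probability that j of n packets fall in [0, alpha)."""
    return math.comb(n, j) * alpha**j * (1 - alpha)**(n - j)


def r_given_alpha(n, alpha, r):
    """Expected transmissions for n packets when the first split is [0, alpha).

    r holds the already optimized values r_0 .. r_{n-1}.
    """
    num = 1 - binom_pmf(n, 1, alpha)
    for j in range(n):
        num += r[j] * (binom_pmf(n, j, alpha) + binom_pmf(n, n - j, alpha))
    den = 1 - binom_pmf(n, n, alpha) - binom_pmf(n, 0, alpha)
    return num / den


def optimal_splits(n_max, grid=1000):
    """Return lists r and alpha with r[n], alpha[n] for n = 0 .. n_max."""
    r, alpha = [0.0, 1.0], [None, None]
    for n in range(2, n_max + 1):
        best_r, best_alpha = min(
            (r_given_alpha(n, i / grid, r), i / grid) for i in range(1, grid))
        r.append(best_r)
        alpha.append(best_alpha)
    return r, alpha


def throughput(mean_packets, n_max=25):
    """E[k] / E[R_k] when the enabled interval holds mean_packets on average."""
    r, _ = optimal_splits(n_max)
    p = math.exp(-mean_packets)                 # Poisson probability of k = 0
    expected_slots = 0.0
    for k in range(n_max + 1):
        expected_slots += p * (1 if k <= 1 else 1 + r[k])
        p *= mean_packets / (k + 1)
    return mean_packets / expected_slots


if __name__ == "__main__":
    r, alpha = optimal_splits(8)
    for n in range(2, 9):
        print(n, round(r[n], 3), round(alpha[n], 3))
    print(throughput(1.88))
```

Because the minimization is done on a grid, the computed values agree with the tabulated ones only to about the last printed digit.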

This  is  a  modest  improvement  over  standard  splitting  algorithms 
using  ternary  feedback.  Why  these  systems  do  not  perform  better  can  be 
easily  identified.  Suppose  in  the  interval  [0,1)  we  know  there  are  k 
packets.  Then,  we  know  that 

$$ \Pr[\, N(0,\alpha) = 1 \mid N(0,1) = k \,] = k\alpha(1-\alpha)^{k-1}, \qquad 0 \le \alpha \le 1 $$

This probability is maximized by choosing

$$ \alpha = 1/k $$

yielding

$$ P(k) = (1 - 1/k)^{k-1} $$

Note that P(2) = 1/2, that P(k) decreases as k increases, and that

$$ \lim_{k \to \infty} P(k) = e^{-1} $$


Except when there is a single packet in the interval, the probability of
a successful transmission never exceeds one half. Essentially,
there is no benefit in waiting for a large backlog. These strategies
serve only to avoid wasting slots in those instances when an
unexpectedly large number of packets arrives in some interval. In fact,
suppose the feedback were restricted so that more than N packets in a
collision is identified as such, rather than by the explicit number of
packets involved. In the analysis [6], it has been shown that for N ≥ 5,
there is only a negligible loss in throughput using this limited
feedback. This is not surprising, since the optimal solution of this
type looks at intervals which are expected to contain relatively few
packets. Moreover, this emphasizes the importance of generating large
collisions.
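
The success probability discussed above is easy to tabulate. The following Python sketch is an illustration only: it evaluates P(k) = (1 - 1/k)^(k-1) and locates the maximizing split point by a grid search, which is an assumption made for simplicity rather than the analytic argument. The function names are hypothetical.

```python
import math


def success_probability(k, alpha):
    """Probability that exactly one of k packets falls in [0, alpha)."""
    return k * alpha * (1 - alpha) ** (k - 1)


def best_success_probability(k):
    """P(k): success probability at the maximizing split alpha = 1/k."""
    return (1 - 1 / k) ** (k - 1)


def best_alpha_on_grid(k, grid=10000):
    """Grid point in (0, 1] that maximizes the success probability."""
    return max((i / grid for i in range(1, grid + 1)),
               key=lambda a: success_probability(k, a))


if __name__ == "__main__":
    for k in (1, 2, 3, 5, 10, 100):
        print(k, best_alpha_on_grid(k), best_success_probability(k))
    print(math.exp(-1))
```

The printed values fall from 1 at k = 1 to 1/2 at k = 2 and then drift down toward 1/e.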


Another approach to identifying the packet distribution from the
collision results is to simply conduct a search for the distribution
which satisfies the collision requirements. If it were possible to
achieve  reasonably  high  throughputs  without  having  to  construct  an 
extraordinarily  large  problem,  then  searching  becomes  a  viable 
technique.  Of  course,  imposing  some  special  structure  on  our 
observation  matrix  may  allow  reduced  search  effort  for  equivalent 
effect.  However,  rather  than  address  the  question  of  special 
structures,  we  will  concentrate  on  the  issue  of  how  large  a  problem  we 
must  consider  in  order  to  attain  a  specified  throughput,  irrespective 
of  any  special  structure. 

The  size  of  our  problem  is  a  function  of  the  number  of  packets  in 
the  interval  we  are  considering.  Clearly,  a  smaller  number  of  packets 
can  distribute  themselves  in  a  smaller  number  of  ways,  resulting  in  a 
simpler  search.  So,  our  intention  will  be  to  lower  bound  the  number  of 
packets  required  in  the  backlog  in  order  to  achieve  a  certain 
throughput. Pippenger's proof showed that we can obtain high
throughputs by considering very large problems. However, the
possibility that modest sized formulations might do reasonably well was
not eliminated. This issue will now be considered.

We will assume we have an interval containing N packets. This
involves no loss of generality. Since knowledge of N can only increase
our throughput, any upper bound to throughput under this assumption is
still a valid upper bound when N is unknown. Moreover, if we were to
implement a collision resolution algorithm, it seems likely that
enabling the entire interval expressly to learn N would be our first
step. We then divide our large interval into L small segments. L is a
parameter to be optimized later. L will be a function of N although at
this point, it is not clear what will be an appropriate choice. Define
x to be the L dimensional vector such that x_i represents the number of
packets in segment i and Σ_i x_i = N.

Our collision resolution algorithm will have two steps. First, we
will generate collisions of the form

    y = Ax

where A is a K x L dimensional matrix such that a_ij ∈ {0,1}. These K
collisions should serve to identify the vector [x_1 ... x_L]. Second,
every segment i containing x_i > 0 packets, we will resolve separately
by splitting. If this second step uses J slots to transmit successfully
the N packets described by x, then our throughput will be

$$ \tau = \frac{\text{Number of packets}}{\text{Expected number of transmissions}} = \frac{N}{K + J} $$

We will find lower bounds for K and J, so we can upper bound throughput.

Our bounding arguments will assume that no packets are successfully
transmitted during the first stage. This assumption implies that the
A matrix must not be sparse. In our earlier coding argument, we showed
that we need our alphabet of output 'code letters' to be large. If A
were sparse, then our output y would consist of small integers and the
size of our alphabet would be small. Thus, our assumption is
reasonable. Moreover, any strategy designed to achieve high throughput
must generate collisions involving a large number of transmitters.
Consequently, every high throughput algorithm can be described by this
two step approach, where step one generates large collisions and step
two transmits packets with very few collisions.

We will use the Source Coding Theorem to derive a lower bound for K.
The Source Coding Theorem says [3]:


Let a discrete memoryless source have finite entropy H(X)
and consider a coding from sequences of L source letters
into sequences of K code letters from a code alphabet of
size D. Only one source sequence can be assigned to each
code sequence and we let P_e be the probability of occurrence
of a source sequence for which no code sequence has been
provided. Then, for any δ > 0 if

    K/L ≥ [H(X) + δ]/log D

P_e can be made arbitrarily small by making L sufficiently
large. Conversely, if

    K/L ≤ [H(X) - δ]/log D

then P_e must become arbitrarily close to 1 as L is made
sufficiently large.


We can view the sequence $\{y_1, \ldots, y_K\}$ as a sequence of $K$ code letters
such that $y_j \in \{0, 1, \ldots, D-1\}$. Since our arrivals are Poisson and our
segments do not overlap, we note that $\{x_i,\ i = 1, \ldots, L\}$ can be viewed as a
set of $L$ independent source letters. The entropy of this sequence is

$$LH(X) = H(X_1 \ldots X_L) = H(X^L)$$

Since conditioning always reduces entropy, we know that

$$H(X^L) \ge H(X^L \mid N)$$

where $H(X^L \mid N)$ denotes the entropy of the sequence $x_1 \ldots x_L$ conditional on

$$\sum_{i=1}^{L} x_i = N$$

Consequently, the second part of the Source Coding Theorem implies
that if

$$K \log D < H(X^L \mid N)$$

then $P_e$ must approach 1 as $L$ becomes sufficiently large. We will call
this the entropy bound. That is, if the entropy bound holds, then the
probability of occurrence of a source sequence for which no code
sequence has been provided approaches one for $L$ large. In the context
of our original problem this allows us to make the following argument.
Suppose we generate a particular collision matrix $A$ and to every vector
$x$ in the set

$$X = \{x \mid \textstyle\sum_i x_i = N,\ 0 \le x_i \le N\}$$

we assign the code word $y = Ax$, if the code word $y$ has not been assigned
previously. Otherwise, if $y$ has already been assigned, then $x$ is
labelled unassigned. So, if the entropy bound is true, then the
probability of occurrence of an unassigned source sequence $x$ approaches
one for $L$ sufficiently large. Equivalently, we can say the probability
of occurrence of a vector $y$ for which there exists more than one vector
$x \in X$ such that $y = Ax$ approaches one for large $L$. In short, we have
the following result:

Theorem: If

$$K \log D < H(X^L \mid N)$$

then the probability that we will not be able to decode $x$ given
$y = Ax$ approaches 1 for $L$ sufficiently large.

To make use of this claim, we must identify $D$ and $H(X^L \mid N)$. To find
the code alphabet size $D$, we observe that

$$y_i = \sum_{j=1}^{L} a_{ij} x_j \le \sum_{j=1}^{L} x_j = N$$

$$y_i \in \{0, \ldots, N\}$$

which implies

$$D = N + 1$$
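
As a small illustration of why the output alphabet has $N + 1$ letters, the sketch below assumes that $A$ is a 0/1 matrix whose row $i$ marks the segments enabled in collision $i$. That reading, the helper name `collision_counts`, and the example values are assumptions made for illustration only.

```python
def collision_counts(A, x):
    """Return y = A x for a 0/1 enabling matrix A and packet counts x."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

# Hypothetical example: L = 4 segments holding N = 5 packets in total.
x = [2, 0, 3, 0]
N = sum(x)
A = [
    [1, 1, 0, 0],  # collision 1 enables segments 1 and 2
    [0, 1, 1, 1],  # collision 2 enables segments 2, 3 and 4
    [1, 1, 1, 1],  # collision 3 enables every segment
]
y = collision_counts(A, x)
print(y)

# Every observed collision size lies in {0, ..., N}, so D = N + 1 letters suffice.
assert all(0 <= yi <= N for yi in y)
```

Because each row of $A$ can only select a subset of the segments, no entry of $y$ can exceed the total packet count.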


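To evaluate $H(X^L \mid N)$ is more difficult. Define the length of our
initial enabled interval to be $\mu$ slots long ($\mu$ is not usually an
integer) and

$$P[X^L \mid N] = \Pr[x_1 = n_1, \ldots, x_L = n_L \mid \textstyle\sum_i x_i = N]$$

$$= \frac{\Pr[x_1 = n_1, \ldots, x_L = n_L,\ \sum_i x_i = N]}{\Pr[\sum_i x_i = N]}$$

$$= \frac{\dfrac{(\lambda\mu/L)^{n_1} e^{-\lambda\mu/L}}{n_1!} \cdots \dfrac{(\lambda\mu/L)^{n_L} e^{-\lambda\mu/L}}{n_L!}}{(\lambda\mu)^N e^{-\lambda\mu}/N!}$$

$$= \frac{N!}{L^N (n_1! \cdots n_L!)}$$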


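The entropy becomes

$$H(X^L \mid N) = -\sum_{n_1, \ldots, n_L} \frac{N!}{L^N (n_1! \cdots n_L!)} \log \frac{N!}{L^N (n_1! \cdots n_L!)}$$

$$= N \log L - \log(N!) + E[\log(n_1! \cdots n_L!)]$$

Since $n_j! \ge 1$ for all $j$,

$$E[\log(n_1! \cdots n_L!)] \ge 0$$

In addition,

$$N! \le N^N$$

implying that

$$H(X^L \mid N) \ge N \log(L/N)$$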
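
The inequality can be checked numerically on small cases. The sketch below enumerates every placement of $N$ packets into $L$ segments, using the multinomial probability $N!/(L^N n_1! \cdots n_L!)$, and compares the exact conditional entropy (in nats) with $N \log(L/N)$. The function names and the test sizes are illustrative assumptions, and brute-force enumeration is only practical for very small $N$ and $L$.

```python
import math
from itertools import product

def conditional_entropy(N, L):
    """Exact H(X^L | N) in nats for N packets placed uniformly in L segments."""
    H = 0.0
    for n in product(range(N + 1), repeat=L):
        if sum(n) != N:
            continue
        # Multinomial probability N! / (L^N n_1! ... n_L!)
        denom = math.prod(math.factorial(nj) for nj in n)
        p = math.factorial(N) / (L ** N * denom)
        H -= p * math.log(p)
    return H

def entropy_lower_bound(N, L):
    """The bound N log(L/N), in nats."""
    return N * math.log(L / N)

for N, L in [(2, 4), (3, 6), (4, 8)]:
    exact = conditional_entropy(N, L)
    bound = entropy_lower_bound(N, L)
    print(N, L, round(exact, 4), round(bound, 4))
    assert bound <= exact <= N * math.log(L)
```

The upper check $H \le N \log L$ follows from the same expansion, since the only positive correction to $N \log L - \log(N!)$ is the expected log-factorial term.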


The first stage consists of generating collisions using some $A$ matrix. For
every source sequence $x^L$, we generate $K$ collisions and we know that in
order to decode $x$ from $y = Ax$ every time, we must have

$$K \ge \frac{H(X^L \mid N)}{\log D} \ge \frac{N \log(L/N)}{\log(N+1)}$$
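
To get a feel for the size of this bound, the following sketch evaluates $N \log(L/N)/\log(N+1)$ for a few backlog sizes. The function name `min_collisions` and the choice of four segments per packet are assumptions made for the example.

```python
import math

def min_collisions(N, L):
    """Lower bound K >= N log(L/N) / log(N+1) on the number of stage-one collisions."""
    return N * math.log(L / N) / math.log(N + 1)

for N in (10, 100, 1000):
    L = 4 * N  # hypothetical choice: four segments per packet
    K = min_collisions(N, L)
    print(N, L, round(K, 2), round(K / N, 3))
```

For a fixed ratio $L/N$, the per-packet cost $K/N$ shrinks only like $1/\log(N+1)$, which is why a small stage-one overhead requires a very large backlog.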

We now can evaluate the second stage of our collision resolution
process. Suppose we are handed a particular distribution of $N$ packets
into our $L$ segments, $[x_1 \ldots x_L]$. Segments containing a single packet
are transmitted directly. Segments containing more than one packet will
be resolved using the optimal splitting algorithm analyzed earlier.

We would like to determine $J$, the expected number of transmissions
necessary to send our $N$ packets given the packet distribution $x$. Denote
$r_k$ as the expected number of transmissions needed to resolve a
particular segment given that we have already identified that there are
$k$ packets in the segment. Happily, this is the same $r_k$ that we
determined earlier. Given the distribution $x$, the number of slots
necessary to resolve a particular segment is independent of the number
of slots necessary to resolve any other segment. Also the number of
packets in a particular segment is binomially distributed and is
described by:

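$$P_N(k) = \Pr[x_j = k \mid \textstyle\sum_i x_i = N]$$

$$= \frac{\Pr[x_j = k,\ \sum_{i \ne j} x_i = N - k]}{\Pr[\sum_i x_i = N]}$$

$$= \frac{\left[(\lambda\mu/L)^k e^{-\lambda\mu/L}/k!\right]\left[(\lambda\mu(1 - 1/L))^{N-k} e^{-\lambda\mu(1 - 1/L)}/(N-k)!\right]}{(\lambda\mu)^N e^{-\lambda\mu}/N!}$$

$$P_N(k) = \binom{N}{k} (1/L)^k (1 - 1/L)^{N-k}$$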
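
The cancellation of $\lambda\mu$ in this ratio can be confirmed numerically. The sketch below computes the conditional probability directly from the three Poisson terms and compares it with the binomial form. The function names and the sample values of $\lambda\mu$ are illustrative assumptions.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of k arrivals with the given mean."""
    return mean ** k * math.exp(-mean) / math.factorial(k)

def conditional_pmf(k, N, L, lam_mu):
    """Pr[x_j = k | sum_i x_i = N], computed directly from Poisson terms."""
    one = poisson_pmf(k, lam_mu / L)                 # segment j holds k packets
    rest = poisson_pmf(N - k, lam_mu * (1 - 1 / L))  # the other L - 1 segments hold N - k
    total = poisson_pmf(N, lam_mu)                   # all L segments hold N
    return one * rest / total

def binomial_pmf(k, N, L):
    """Binomial form C(N, k) (1/L)^k (1 - 1/L)^(N - k)."""
    return math.comb(N, k) * (1 / L) ** k * (1 - 1 / L) ** (N - k)

N, L = 6, 10
for lam_mu in (0.5, 3.0, 12.0):  # arbitrary illustrative values of lambda * mu
    for k in range(N + 1):
        assert math.isclose(conditional_pmf(k, N, L, lam_mu), binomial_pmf(k, N, L))
print("conditional Poisson matches binomial for all tested values")
```

The agreement for every tested $\lambda\mu$ reflects that, once $N$ is given, the arrival rate no longer matters.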

These facts allow us to write

$$J = L \sum_{k=1}^{N} r_k P_N(k)$$

The values $r_k$ are summarized here:

$$r_0 = 0$$

$$r_1 = 1$$

$$r_2 = 3$$

$$r_k \ge 1.874k - 1 \quad \text{for all } k$$

The inequality for $k \ge 3$ is very tight.
Applying this result, we have

$$J \ge L \left[ \sum_{k=0}^{N} (1.87k - 1) P_N(k) + P_N(0) + .13 P_N(1) \right]$$

$$J \ge L \left[ 1.87(N/L) - 1 + (1 - 1/L)^N + .13 N (1/L)(1 - 1/L)^{N-1} \right]$$
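
The step from the sum to the closed form uses only the binomial mean $\sum_k k P_N(k) = N/L$, the normalization $\sum_k P_N(k) = 1$, and the explicit values of $P_N(0)$ and $P_N(1)$. A minimal numerical check, with illustrative function names and test sizes chosen here as assumptions:

```python
import math

def p_n(k, N, L):
    """Binomial probability that a given segment holds k of the N packets."""
    return math.comb(N, k) * (1 / L) ** k * (1 - 1 / L) ** (N - k)

def j_bound_sum(N, L):
    """L * [ sum_k (1.87k - 1) P_N(k) + P_N(0) + .13 P_N(1) ]."""
    s = sum((1.87 * k - 1) * p_n(k, N, L) for k in range(N + 1))
    return L * (s + p_n(0, N, L) + 0.13 * p_n(1, N, L))

def j_bound_closed(N, L):
    """L * [ 1.87 N/L - 1 + (1 - 1/L)^N + .13 N (1/L) (1 - 1/L)^(N-1) ]."""
    q = 1 - 1 / L
    return L * (1.87 * N / L - 1 + q ** N + 0.13 * N / L * q ** (N - 1))

for N, L in [(5, 5), (20, 60), (50, 400)]:
    a, b = j_bound_sum(N, L), j_bound_closed(N, L)
    print(N, L, round(a, 6), round(b, 6))
    assert math.isclose(a, b, rel_tol=1e-9)
```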

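We will define $\alpha$ such that $L = \alpha N$, yielding

$$J/N \ge 1.87 - \alpha + \alpha\left(1 - \frac{1}{\alpha N}\right)^N + .13\left(1 - \frac{1}{\alpha N}\right)^{N-1}$$

Asymptotically, this leads to the following definition

$$r(\alpha) = \lim_{N \to \infty} \left[ 1.87 - \alpha + \alpha\left(1 - \frac{1}{\alpha N}\right)^N + .13\left(1 - \frac{1}{\alpha N}\right)^{N-1} \right]$$

$$= 1.87 - \alpha + (\alpha + .13) e^{-1/\alpha}$$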
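
The convergence of the finite-$N$ expression to $r(\alpha)$ can be observed directly. In the sketch below, `r_finite` and `r_limit` are names introduced here for illustration, and $\alpha = 2$ is an arbitrary sample value.

```python
import math

def r_finite(alpha, N):
    """Right-hand side of the J/N bound for finite N, with L = alpha * N."""
    q = 1 - 1 / (alpha * N)
    return 1.87 - alpha + alpha * q ** N + 0.13 * q ** (N - 1)

def r_limit(alpha):
    """Limiting form r(alpha) = 1.87 - alpha + (alpha + .13) exp(-1/alpha)."""
    return 1.87 - alpha + (alpha + 0.13) * math.exp(-1 / alpha)

alpha = 2.0  # illustrative ratio L/N
for N in (10, 100, 1000, 10000):
    print(N, r_finite(alpha, N), r_limit(alpha))
```

The gap between the two columns shrinks roughly like $1/N$, as expected from $(1 - 1/\alpha N)^N \to e^{-1/\alpha}$.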


For large values of $N$, given $L/N = \alpha$, we would expect to need $N r(\alpha)$
slots to send successfully all $N$ data packets. The function $r(\alpha)$ is
convex and non-increasing. Suppose we were able to determine the
distribution $[x_1 \ldots x_L]$ without generating a single collision. In this
unlikely circumstance, our throughput, for all $\alpha$, would be

$$t = N/J \le 1/r(\alpha)$$


If we chose $\alpha = 1$, that is $L = N$, we would have $t \le 1/r(1) \approx .78$. To achieve a
particular throughput, we must choose $\alpha$ sufficiently large so that the
likelihood of collisions in the second stage is suitably small. Obviously,
since we normally must identify $[x_1 \ldots x_L]$, which is not an easy task, our
actual attainable throughput must be significantly lower.
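
The ceiling $1/r(\alpha)$ on second-stage throughput can be tabulated for a few segment densities. This is a sketch only: the function names and the sample values of $\alpha$ are assumptions, and the figures ignore the cost of the first stage entirely.

```python
import math

def r(alpha):
    """r(alpha) = 1.87 - alpha + (alpha + .13) exp(-1/alpha)."""
    return 1.87 - alpha + (alpha + 0.13) * math.exp(-1 / alpha)

def throughput_ceiling(alpha):
    """Upper bound 1/r(alpha) on throughput if stage one cost nothing."""
    return 1 / r(alpha)

for alpha in (1, 2, 5, 10, 50):
    print(alpha, round(r(alpha), 4), round(throughput_ceiling(alpha), 4))
```

The ceiling rises toward one only as $\alpha$ grows, that is, as the segments become fine enough that second-stage collisions are rare.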

In addition, note that

$$\lim_{\alpha \to \infty} r(\alpha) = 1$$

For $L$ very much larger than $N$, only $N$ slots are necessary to transmit
all $N$ packets. Given the distribution $x$, we see that the likelihood
that a single segment contains more than one packet becomes very small
for large $L$. This indicates that our bound on $J$ is asymptotically
tight.

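We now can combine the results of our two stages. Using the lower bound
$K \ge N \log(L/N)/\log(N+1)$ with $L = \alpha N$, and $J \ge N r(\alpha)$ for large $N$,

$$t = \frac{N}{K + J} \le \left[ \frac{\ln \alpha}{\ln(N+1)} + r(\alpha) \right]^{-1}$$

For $t$ close to one, we can continue to lower bound $N$, giving an
indication of its asymptotic growth. Substituting our formula for $r(\alpha)$
into our above bound, we have

$$t \le \left[ \frac{\ln \alpha}{\ln(N+1)} + 1.87 - \alpha + (\alpha + .13) e^{-1/\alpha} \right]^{-1}$$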
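
The tradeoff in this bound can be explored numerically: the first term falls as $N$ grows, while the second falls as $\alpha$ grows but increases the first. The sketch below searches a grid of $\alpha$ for the largest value of the bound at each backlog size. The grid, the function names, and the sample values of $N$ are assumptions made for illustration, not values taken from the text.

```python
import math

def r(alpha):
    """r(alpha) = 1.87 - alpha + (alpha + .13) exp(-1/alpha)."""
    return 1.87 - alpha + (alpha + 0.13) * math.exp(-1 / alpha)

def throughput_bound(alpha, N):
    """Combined bound [ln(alpha)/ln(N+1) + r(alpha)]^(-1) for L = alpha * N."""
    return 1 / (math.log(alpha) / math.log(N + 1) + r(alpha))

def best_bound(N, alphas=None):
    """Largest value of the bound over a grid of alpha >= 1 (the grid is an assumption)."""
    if alphas is None:
        alphas = [1 + 0.01 * i for i in range(20000)]  # alpha from 1 to about 200
    return max((throughput_bound(a, N), a) for a in alphas)

for exponent in (2, 5, 10, 20, 40):
    N = 10 ** exponent
    t, a = best_bound(N)
    print(f"N = 1e{exponent}: t <= {t:.4f} at alpha ~ {a:.2f}")
```

Even with the best $\alpha$ on the grid, the bound approaches one only for astronomically large $N$.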


Noting that

$$e^{-1/\alpha} > 1 - \frac{1}{\alpha} + \frac{1}{2\alpha^2} - \frac{1}{6\alpha^3}$$

we learn

$$t \le \left[ \frac{\ln \alpha}{\ln(N+1)} + 1 + .37/\alpha - .232/\alpha^2 \right]^{-1}$$
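
The intermediate algebra is not shown, so the following is one way to reach a bound of this form, offered as a sketch rather than as the original derivation. Coefficients are rounded to four places, and the last step assumes $\alpha \ge 1$. On this reading, the stated coefficient $.232$ is looser than the expansion strictly requires, but the resulting bound is still valid.

```latex
\begin{align*}
(\alpha + .13)\,e^{-1/\alpha}
  &> (\alpha + .13)\left(1 - \frac{1}{\alpha} + \frac{1}{2\alpha^2} - \frac{1}{6\alpha^3}\right) \\
  &\approx \alpha - .87 + \frac{.37}{\alpha} - \frac{.1017}{\alpha^2} - \frac{.0217}{\alpha^3}, \\
r(\alpha) = 1.87 - \alpha + (\alpha + .13)\,e^{-1/\alpha}
  &> 1 + \frac{.37}{\alpha} - \frac{.1017}{\alpha^2} - \frac{.0217}{\alpha^3} \\
  &\ge 1 + \frac{.37}{\alpha} - \frac{.232}{\alpha^2} \qquad (\alpha \ge 1).
\end{align*}
```

Since $r(\alpha)$ appears in the denominator of the throughput bound, replacing it by anything smaller keeps the inequality on $t$ valid.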

Suppose we choose $\alpha^*$ such that

$$1 + .37/\alpha^* - .232/(\alpha^*)^2 = 1/t$$

Solving this equation for $\alpha^*$, we have for $t \ge .87$

$$\alpha^* = .185 p \left[ 1 + (1 - 6.78/p)^{1/2} \right]$$

where $p = t/(1 - t)$. Note that in order to achieve a throughput $t$, we must have
$\alpha \ge \alpha^*$. Since $1/t \ge 1$, we can say

$$t \le \left[ \frac{\log \alpha^*}{\log(N+1)} + 1 \right]^{-1}$$

This implies

$$N + 1 \ge e^{(\ln \alpha^*) p} \ge e^{p \ln(.185p)} = [p/5.4]^p, \qquad p = \frac{t}{1-t}, \quad t \ge .87$$
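
A short numerical sketch of these two formulas follows. It evaluates $\alpha^*$, checks that it approximately satisfies $1 + .37/\alpha^* - .232/(\alpha^*)^2 = 1/t$ (the residual is not exactly zero because the constants are rounded), and reports the natural logarithm of the closed-form bound $[p/5.4]^p$. The function names and sample throughputs are assumptions, and the printed values come from the closed-form bound only.

```python
import math

def alpha_star(t):
    """Larger root of 1 + .37/a - .232/a^2 = 1/t; requires p = t/(1-t) >= 6.78."""
    p = t / (1 - t)
    return 0.185 * p * (1 + math.sqrt(1 - 6.78 / p))

def log_backlog_bound(t):
    """Natural log of the bound (p/5.4)^p on N + 1, with p = t/(1-t)."""
    p = t / (1 - t)
    return p * math.log(p / 5.4)

for t in (0.90, 0.93, 0.95, 0.99):
    a = alpha_star(t)
    residual = 1 + 0.37 / a - 0.232 / a ** 2 - 1 / t
    print(t, round(a, 3), round(residual, 5), round(log_backlog_bound(t), 1))
```

Working with the logarithm avoids overflow, since the bound itself grows faster than exponentially in $p$.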


Along with figure 1, this describes the tradeoff between complexity
and  high  throughput  for  this  problem.  However  it  is  a  very  discouraging 
tradeoff.  For  example,  we  note  from  figure  1,  that  to  obtain  a 
throughput  of  .93,  we  must  consider  a  backlog  containing  at  least  e®* 


packets. Moreover, there is no guarantee that considering such a large
number of packets will yield the desired throughput. When we remind
ourselves that the underlying stage one problem to identify the
distribution $x$ is NP complete, the size of the problem generated seems
totally daunting. In particular, we can evaluate the size of the set
over which we must search. If we are simply looking for solutions $x$ to
$y = Ax$, then the search time would be proportional to the size of the
set

$$X = \{x : \textstyle\sum_i x_i = N\}$$


We can lower bound the cardinality of this set. First, we claim that

$$|X| = \sum_{R=1}^{N} \binom{L}{R} \binom{N-1}{R-1}$$

To verify this claim, note that $\binom{L}{R}$ is the number of ways of choosing $R$
components of $x$ to be nonzero. Also, $\binom{N-1}{R-1}$ equals the number of ways
to distribute $N$ packets in the $R$ nonzero components $x_{k_1}, \ldots, x_{k_R}$. To
see this, observe that we can equivalently specify the sequence

$$x_{k_1},\ x_{k_1} + x_{k_2},\ \ldots,\ x_{k_1} + \cdots + x_{k_R}$$

This is a strictly increasing sequence such that

$$x_{k_1} + \cdots + x_{k_R} = N$$

However, the first $R - 1$ terms of the sequence can be chosen, without
repetition, from the values $1, \ldots, N - 1$ in $\binom{N-1}{R-1}$ possible ways.


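Note that

$$\binom{N-1}{R-1} = (R/N) \binom{N}{R} \ge (1/N) \binom{N}{R}$$

As a result,

$$|X| \ge (1/N) \sum_{R=1}^{N} \binom{L}{R} \binom{N}{R} = (1/N) \left[ \binom{L+N}{N} - 1 \right]$$

$$= (1/N) \left[ \binom{(1+\alpha)N}{N} - 1 \right]$$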
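
The counting argument can be checked by brute force on small cases. The sketch below counts the vectors in $X$ by enumeration, by the sum over the number $R$ of nonzero components, and by the standard stars-and-bars count $\binom{N+L-1}{N}$, then compares the result with the lower bound. The function names and test sizes are illustrative assumptions. The $-1$ in the bound is the $R = 0$ term of Vandermonde's identity, which the sum from $R = 1$ omits.

```python
import math
from itertools import product

def count_by_enumeration(N, L):
    """Count vectors x in {0..N}^L whose components sum to N (small cases only)."""
    return sum(1 for x in product(range(N + 1), repeat=L) if sum(x) == N)

def count_by_formula(N, L):
    """sum_R C(L,R) C(N-1,R-1): choose R nonzero components, then split N among them."""
    return sum(math.comb(L, R) * math.comb(N - 1, R - 1) for R in range(1, N + 1))

def cardinality_lower_bound(N, L):
    """(1/N) [C(L+N, N) - 1], using C(N-1,R-1) >= (1/N) C(N,R) and Vandermonde's identity."""
    return (math.comb(L + N, N) - 1) / N

for N, L in [(3, 4), (4, 6), (5, 5)]:
    exact = count_by_enumeration(N, L)
    assert exact == count_by_formula(N, L) == math.comb(N + L - 1, N)
    assert cardinality_lower_bound(N, L) <= exact
    print(N, L, exact, cardinality_lower_bound(N, L))
```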

When we remind ourselves of how rapidly $N$ diverges, we have shown
that any search strategy that achieves high throughput would require a
truly enormous number of computations. Even if we were to find a
special observation matrix that makes solving the stage one problem
relatively easy, we would still have an extraordinarily large 'easy'
problem. In addition, no such special matrix has yet been identified.


Our lower bound on the number of packets is valid only for
strategies that can be described by our problem formulation. In
particular, collision resolution does not require identifying the
number of packets in every small segment of the arrival time axis. All
that is truly necessary is to be able to specify subsets of the arrival
axis that contain a single data packet. Strategies which solve this
problem without reverting to the $Ax = y$ segmented problem formulation
have yet to be described. Moreover, for any approach that yields high
throughput, it would appear from our source coding arguments that a
large code alphabet is necessary. In other words, we must generate
collisions involving large numbers of data packets. Although precisely
how large is an issue, it seems clear that some tradeoff between
complexity and throughput would still exist. In short, our results
suggest that without doing an extraordinary amount of work, we cannot
hope to do significantly better than the throughput of .53 attained by
the optimal splitting algorithm.